Linguistic complexity: English vs. Polish, text vs. corpus

Authors

  • Jaroslaw Kwapien
  • Stanislaw Drozdz
  • Adam Orczyk
Abstract

We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks down after about two decades of ranks, while for the inflected word forms it might hold over the whole range of ranks. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than for a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
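As a rough illustration of what a rank-frequency analysis involves, the sketch below counts word occurrences, sorts them by rank, and estimates the slope of the distribution on log-log axes over a chosen window of ranks. It is not the authors' procedure, which the abstract does not describe: the function names `rank_frequency` and `loglog_slope`, the regular-expression tokenizer, and the least-squares fit are all assumptions made for this example, and no lemmatization or part-of-speech tagging is performed.

```python
from collections import Counter
import math
import re


def rank_frequency(text):
    """Return word frequencies sorted in descending order (rank 1 first).

    Tokenization is a simple assumption: lowercase runs of word characters.
    Inflected forms are counted separately; no lemmatization is applied.
    """
    words = re.findall(r"\w+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def loglog_slope(freqs, min_rank=1, max_rank=None):
    """Least-squares slope of log10(frequency) against log10(rank).

    The fit is restricted to ranks in [min_rank, max_rank]; the window
    must contain at least two ranks. A slope close to -1 over a window
    corresponds to a Zipf-like (scale-invariant) regime in that window.
    """
    if max_rank is None:
        max_rank = len(freqs)
    points = [
        (math.log10(rank), math.log10(freq))
        for rank, freq in enumerate(freqs, start=1)
        if min_rank <= rank <= max_rank
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Comparing the slope fitted over ranks 1 to 100 (the first two decades) with the slope fitted over higher ranks is one simple way to see whether a single power law describes the whole range, for instance `loglog_slope(freqs, 1, 100)` against `loglog_slope(freqs, 100, len(freqs))`.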

Similar resources

Neurocognitive dimensions of lexical complexity in Polish.

Neuroimaging studies of English suggest that speech comprehension engages two interdependent systems: a bilateral fronto-temporal network responsible for general perceptual and cognitive processing, and a specialised left-lateralised network supporting specifically linguistic processing. Using fMRI we test this hypothesis in Polish, a Slavic language with rich and diverse morphology. We manipul...

The Inner Circle vs. the Outer Circle or British English vs. American English

In this paper, the use of two modals (can and may) in four varieties of English (British, India, Philippines, and USA) was compared and the characteristics of each variety were statistically analyzed. After all the sample sentences were extracted from each component of the ICE corpus, a total of twenty linguistic factors were encoded. Then, the collected data were statistically analyzed with R....

Cross-Linguistic Transfer or Target Language Proficiency: Writing Performance of Trilinguals vs. Bilinguals in Relation to the Interdependence Hypothesis

This study explored the nature of transfer among bilinguals vs. trilinguals with varying levels of competence in English and their previous languages. The hypotheses were tested in writing tasks designed for 75 high (N=35) vs. intermediate (N=40) proficient EFL learners with Turkish, Persian, English and Persian, English linguistic backgrounds. Qualitative data were also collected through some ...

Computational analysis to explore authors' depiction of characters

This study involves automatically identifying the sociolinguistic characteristics of fictional characters in plays by analyzing their written “speech”. We discuss three binary classification problems: predicting the characters’ gender (male vs. female), age (young vs. old), and socioeconomic standing (upper-middle class vs. lower class). The text corpus used is an annotated collection of August...

Lexicalization vs. Vocalization: A Cross-Linguistic Study of Emphasis in English and Persian

Language is a system of verbal elements that makes communication of meanings possible in the manners the users intend by employing certain linguistic devices which are partly language-specific. Once communicating cross-linguistically, there is always a risk of negative transfer of techniques or processes from the first language (L1) to the foreign language (L2). The current study investigates the “e...

Journal:
  • CoRR

Volume: abs/1007.0936

Publication date: 2010